Intelligent Big Multimedia Databases by Andreas Wichert

Intelligent Big Multimedia Databases by Andreas Wichert

Author:Andreas Wichert [Wichert, Andreas]
Language: eng
Format: epub
Published: 2015-06-06T16:59:44+00:00


April 7, 2015 16:32

ws-book9x6

9665-main page 153

Low Dimensional Indexing

153

a lower bound and an upper bound, which is the distance from the query

point to the centroid minus the radius for the lower bound plus the radius

for the upper bound value. According to Böhm, SS-trees outperform the

R-trees [Böhm et al. (2001)].

Numerous tree structures, such as SR-trees, which can be regarded as

a variation of the R-tree, such as the SS-tree, R∗-trees, X-trees, TV-trees

use different heuristics to minimize or prevent overlap of the main regions

(for example MBRs).

6.3.4

High-dimensional space

Traditional tree-based methods employ the principle of hierarchical clus-

tering of the data space, in which metric properties are used to construct a

tree that can be used to prune branches while processing the queries. The

performance of tree-based methods deteriorates with large dimensionality.

To understand why these methods fail in high-dimensional spaces, let us as-

sume that our tree has only the hierarchy space. We group the images into

clusters represented by the cluster centers cj. After clustering, k centroids

c1, c2 · · · , ck

and k clusters

C1, C2 · · · , Ck

are present with

Cj = {x|d2(x, cj) = min d2(x, ct)}.

(6.21)

t

Each cluster Cj contains the points that are closest to the centroid cj.

During the categorization task of a given query vector y, the most similar

centroid ci is determined representing the group of similar vectors. Does

this simple model also work for query retrieval? Could we take advantage of

the grouping of the vectors into clusters? The idea would be to determine

the most similar centroid ci which represents the most similar category.

In the next step, we would search for the most similar vector according

to the -similarity only in this cluster Ci. By doing so, we could save

some considerable computation, see Figure 6.16. Suppose l = mini d(y, ci)

is the distance to the closest centroid and rmax is the maximal radius of

all the clusters. Only if l ≥ ≥ rmax, we are guaranteed to determine

similar vectors. Otherwise we have to analyze other clusters as well, Figures

6.17, 6.18 and Figure 6.19 indicate some counter examples for other cases



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.